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Abstract 

Sequential Monte Carlo (SMC) methods, such as the particle filter, 
are by now one of the standard computational techniques for addressing 
the filtering problem in general state-space models. However, many ap¬ 
plications require post-processing of data offline. In such scenarios the 
smoothing problem—in which all the available data is used to compute 
state estimates—is of central interest. We consider the smoothing prob¬ 
lem for a class of conditionally linear Gaussian models. We present a 
forward-backward-type Rao-Blackwellized particle smoother (KBPS) that 
is able to exploit the tractable substructure present in these models. Akin 
to the well known Rao-Blackwellized particle Hlter, the proposed RBPS 
marginalizes out a conditionally tractable subset of state variables, effec¬ 
tively making use of SMC only for the “intractable part” of the model. 
Compared to existing RBPS, two key features of the proposed method are: 
(i) it does not require structural approximations of the model, and (ii) 
the aforementioned marginalization is done both in the forward direction 
and in the backward direction. 


1 Introduction 

State-space models (SSMs) comprise one of the most important model classes in 
statistical signal processing, automatic control, econometrics, and related areas. 
A general discrete-time SSM is given by 

Xt+ipixt+i\xt), (la) 

yt^piyt\xt), (lb) 

where Xt € is the latent state process and yt € M"" is the observed mea¬ 
surement process (we use the common convention that p denotes an arbitrary 
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Reasoning over Time (Reference: EP/K020153/1), funded by the EPSRC. 
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probability density function (PDF) induced by the model Q, which is iden¬ 
tified by its arguments). When the model is linear and Gaussian the filtering 
and smoothing problems can be solved optimally by using methods such as the 
Kalman filter and the Rauch-Tung-Striebel smoother, respectively (see, e.g., 
m)- When going beyond the linear Gaussian case, however, no analytical 
solution for the optimal state inference problem is available, which calls for 
approximate computational methods. 

Many popular deterministic methods are based on Gaussian approximations, 
for instance through linearization and related techniques. An alternative ap¬ 
proach, for which the accuracy of the approximation is limited basically only by 
the computational budget, is to use Monte Garlo methods. Among these, se¬ 
quential Monte Garlo (SMG) methods such as particle filters (PF) and particle 
smoothers (PS) play a prominent role (see, e.g., [lOl [T5] b 

While SMG can be applied directly to the general model Q, it has been 
recognized that, in many cases, there is a tractable substructure available in 
the model. This structure can then be exploited to improve the performance of 
the SMG method. In particular, the Rao-Blackwellized PF (RBPF) [IHIIZ] has 
been found to be very useful for addressing the filtering problem in conditionally 
linear Gaussian (CLG) SSMs (see Section]^. As pointed out in [6], CLG models 
have found an “exceptionally broad range of applications”. 

However, many of the applications, as well as system identification, of SSMs 
rely on batch analysis of data. The central object of interest is then the smooth¬ 
ing distribution, that is, the distribution of the system state(s) conditionally on 
all the observed data. While there exist many SMG-based smoothers (see e.g., 
[23] and the references therein) variance reduction by Rao-Blackwellization has 
not been as well explored for smoothing as for filtering. 

In this paper, we present a Rao-Blackwellized PS (RBPS) for general CLG 
models. The proposed method is based on the forward filter/backward simulator 
(FFBS) [12 . Contrary to the related forward-backward-type RBPS used by 
|19| . the proposed method does not require any structural approximations of 
the model. Another key feature of the proposed method is that it employs Rao- 
Blackwellization both in the forward and backward directions, as opposed to m 
who sample the full system state in the backward direction. The use of Rao- 
Blackwellization also in the backward direction is necessary for the smoother 
to be truly Rao-Blackwellized. An alternative RBPS, specifically targeting the 
marginal smoothing distribution, which is based on the generalized two-filter 
formula is presented in |3|. 

This contribution builds upon two previous conference publications, m and 
|22| , where we studied two specific model classes, hierarchical models and mixed 
linear/nonlinear models, respectively (see the next section for definitions). Fur¬ 
thermore, independently of m, Whiteley et al. [32] have derived essentially the 
same RBPS for hierarchical models as we present here, although they study ex¬ 
plicitly the special case of jump Markov systems. The present work goes beyond 
[171112 on several accounts. First, the techniques used for the derivations and, 
as an effect, details of the algorithmic specifications in these two proceedings 
differ substantially. Here, we harmonise the derivation and provide a general al- 


2 


gorithm which is applicable to both types of CLG models under study. We also 
improve the previous results by extending the method to more general models. 
Specifically, we allow for correlation between the process noises entering the 
conditionally linear and the nonlinear parts of the model, and rank-deficient 
process noise covariances in the conditionally linear parts. This comprises an 
important class of models in practical applications m- Finally, we provide 
several extensions to the main method (Section]^ that we view as a key part 
of the proposed methodology. 

For a vector fi and a positive semidefinite matrix n ^ 0, we write ||/r||Q = 
We write \ A\ for matrix determinant and A/'(^, S) and Af{x; /r, S) for the 
Gaussian distribution and PDF, respectively. 


2 Conditionally linear Gaussian models 

Let the system state be partitioned into two parts: Xt = {ut,zt), where Ut G 
is referred to as the nonlinear state and Zt € K"* is referred to as the linear 
state. The SSM 0 IS said to be CLG if the conditional process ^z^j | 
follows a time-inhomogeneous linear Gaussian SSM. For concreteness, we will 
study two specific classes of CLG models, which are of particular practical 
interest. However, by combining these two model classes the proposed method 
can straightforwardly be generalized to other CLG models. 

Model 1 (Hierarchical CLG model) A hierarchical CLG model is given by, 


Ut+i ^ p{ut+i\ut), (2a) 

Zt+i = f{ut+i) + A{ut+i)zt + F{ut+i)vt, (2b) 

yt = h{ut) + C{ut)zt + et, (2c) 

with process noise Vt A/'(0,/„^) and measurement noise et ~ JV{0, R(ut)), 
respectively, where R{ut) is a positive definite matrix for any Ut € K"". 

Model can be seen as a generalization of a jump Markov system, in which 
the “jump” or “mode” variable Ut is allowed to be continuous. However, the 
hierarchical structure of Model can sometimes be limiting. We will therefore 
study also the following model class. 


Model 2 (Mixed linear/nonlinear model) A mixed linear/nonlinear CLG 
model is given by. 


Ut+l 


9{ut) 

1 

B{ut) 

Zt + 

'G{ut) 



f{ut) 


y4(ut)_ 

F{ut)_ 


yt = h{ut) + C{ut)zt + et, 


(3a) 

(3b) 


with process noise Vt A/'(0,/n^) and measurement noise et ~ JV{0, R{ut)), 
respectively, where Q{ut) = G{ut)G{utY o,nd R{ut) are assumed to be positive 
definite matrices for any ut G M"". 
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Note that Model allows for a cross-dependence in the dynamics of the 
two state-components, that is Ut+i depends explicitly on zt and vice versa. 
Mixed linear/nonlinear models arise, for instance, when the observations depend 
nonlinearly on a subset of the states in a system with linear dynamics. See m 
for several examples from target tracking where this model is used. 


Remark 1 We do not assume that F{ut)F{ut)~^ is full rank, that is, it is only 
the part of the process noise that enters on the nonlinear state Ut that is assumed 
to be non-degenerate. Note also that the model (3a) readily allows for correlation 
between the components of the process noise entering on the nonlinear state and 
on the linear state, respectively. 


It is worth to emphasize that Model [T] is not a special case of Model 
since p(ut+i Iwj) may be non-Gaussian in \2a\. Nevertheless, Model is sim¬ 
pler than Model in many respects. Indeed, one reason for why we study 
both model classes in parallel is to more clearly convey the idea of the deriva¬ 
tion. This is possible since we can start by looking at the (simpler) hierarchical 
CLG model, before generalizing the expressions to the (more involved) mixed 
linear/nonlinear model. 


3 Background 

3.1 Particle filtering and smoothing 

Consider hrst the general SSM ([^. A PF is an SMC algorithm used to approx¬ 
imate the intractable filtering density p{xt \ yi.t) (see e.g. [lQl[T5]). Rather than 
targeting the sequence of filtering densities directly, however, the PF targets 
the sequence of joint smoothing densities p{xi.,t \ yi-t) for t = I, 2, .... This is 
done by representing p{xi.,t \ yi-t) with a set of weighted particles {x\.^,wl}f^i, 
each of which is a state trajectory xi.,t. These particles define the point-mass 
approximation. 


N 

2 = 1 


(4) 


where Sx denotes a Dirac distribution at point x. In the simplest particle hlter, 
the t-th set of particles are formed by sampling Xi-^-i from the previous distribu¬ 
tion (resampling) and then Xt from an importance distribution q{xt \ Xi-t-i,yt). 
A weight is assigned to each particle to account for the discrepancy between the 
proposal and the target density. The importance weight is given by the ratio of 
target and proposal densities, which simplihes to. 


wt{xi.,t) oc 


Pjyt I xt)p{xt I xt-i) 

q{xt I xi,t-i,yt) 


(5) 


Note that an approximation to p{xt \ yi-.t) is obtained by marginalization of (|^, 
which equates to simply discarding x\.f_i for each particle i = 1, ..., N. 
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The term “smoothing” encompasses a number of related inference problems. 
Basically, it amounts to computing the posterior PDF of some (past) state 
variable, given a batch of measurements j/u- Here we focus on the estimation 
of the complete joint smoothing density, p{xi:T \ Ui-.t)- Any marginal smoothing 
density can be computed from the joint smoothing density by marginalization. 

In fact, the joint smoothing distribution is approximated at the final step 
of the particle filter |2()j . However, this approximation suffer from the problem 
of path degeneracy, that is, the number of unique particles decreases rapidly 
for t < T [m [ID]. To mitigate this issue, a diverse set of particles may be 
generated by sampling state trajectories using the forward filtering/backward 
simulation (FFBS) algorithm [ID]. FFBS exploits a sequential factorization of 
the joint smoothing density: 

T-l 

p{xi:T I VI-.t) = p{xT I Vi-.t) I VI-.t)- (6) 

t=l 

At time T, a final state xt is first sampled from the particle filter approximation 
p^ {xt I Ui-.t)- Then, working backward from time T, each subsequent state Xt 
is sampled (approximately) from the backward kernel, p{xt \xt+i-.T^ Ui-.t)- The 
resulting trajectory Xi,t is then an approximate sample from the joint smoothing 
distribution. 

Using the Markov property, the backward kernel may be expressed as 

p{xt I Xt+i:T, Ut.t) oc p{xt+i I Xt)p{xt I UT.t)- (7) 

By using the PF approximation of the filtering distribution, we obtain the fol¬ 
lowing point-mass approximation of the backward kernel: 

N 

P^ {xt\xt+i,T,U i-.t) = ^ (8) 

i=l 

with wl^j- oc wlp(xt+i I x\). The FFBS algorithm samples from this approxima¬ 
tion in the backward simulation pass. 

Typically, we repeat the backward simulation, say, M times. This gener¬ 
ates a collection of backward trajectories {x\.j-}^^i which define a point-mass 
approximation of the joint smoothing distribution according to, 

1 

P^{xi,T I Ui-.t) - (xi-.t)- (9) 

From this, any marginal or fixed-interval smoothing distribution can be approx¬ 
imated by simply discarding the parts of the backward trajectories which are 
not of interest. 

3.2 Rao-Blackwellized particle filter 

We now turn our attention specifically to CLG models, such as Model or 
Model The structure inherent in these models can be exploited when ad¬ 
dressing the filtering problem. This is done in the RBPF, which is based on 
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the factorization p(ui:t, Zt \ yi-.t) = p{zt \ unt, yi:t)p{ui,t \ yi-.t)- Since the model 
is CLG, it holds that 


p{zt I ui:t, ynt) = N{zu Zt\t{ui,t), Pt\t{ui..t)) , 


( 10 ) 


for some mean and covariance functions, Zi^tiui-t) and , respectively. 

A PF is used to estimate only the nonlinear state marginal density while condi¬ 
tional Kalman filters, one for each particle, are used to compute the moments 


for the linear state in (10). The resulting RBPF approximation is given by 




t,Zt 


N 

\yi:t) = '^wlAf{zt;zl\t^Pl\t)K\Jui.,t), 


where = Zt\t{u\.f) and Pf^^ = Pt\tiu\.t). The particle weights are given by the 
ratio of p{yt,ut \ ui-,t-i,yi:t-i) and the importance density. See [7] for details on 
the implementation for the hierarchical CLG model and [29] for the mixed lin¬ 
ear/nonlinear model. The reduced dimensionality of the particle approximation 
results in a reduction in variance of associated estimators IlllHj- 

For numerical stability, it is recommended to implement the conditional 
Kalman filters on square-root form. That is, we propagate, e.g., the Cholesky 
factor F 4 |t(Mi,t) of the conditional covariance matrix, rather than the covariance 
matrix itself, where rt|j(ui:t) is such that 

( 11 ) 

See Section [531 


4 Rao-Blackwellized particle smoothing 


We now turn to the derivation of the new KBPS. The method is an FFBS which 
uses the RBPF as a forward filter. The novelty lies in the construction of a back¬ 
ward simulator which samples only the nonlinear state in the backward pass. 
Difficulty arises because marginally (and conditionally on the observations) the 
nonlinear state process is non-Markovian. Practically, this means that the back¬ 
ward kernel cannot be expressed in a simple way, as in ([^. We address this 
difficulty by deriving a backward recursion for a set of sufficient statistics for 
the backward kernel. This backward recursion is reminiscent of the backward 
filter in the two-filter smoothing formula for a linear Gaussian SSM (see e.g., 
[T51 Ghapter 10]). 

The basic idea is presented in Section |4.1[ together with the statement of a 
general algorithm which samples state trajectories for the nonlinear states. We 
then consider the two specific model classes, the hierarchical GLG model and 


the mixed linear/nonlinear model, in Section 4.2 and Section 4.3 respectively. 
In Section |4.4| we discuss how to compute the smoothing distribution for the 
linear states. 
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4.1 Rao-Blackwellized backward simulation 

We wish to derive a backward simulator for the nonlinear process {ut}. That 
is, the target density is p(mi:t I Vi-.t)- However, when marginalizing the linear 
states {zt}, we introduce a dependence in the measurement likelihood on the 
complete history ui-t- As a consequence, we must sample complete trajectories 
produced by the RBPF when simulating the nonlinear backward trajectories; 
see |23l Chapter 4] for a general treatment of backward simulation in the non- 
Markovian setting. To solidify the idea, note that the target density can be 
expressed as 


p{ui:T I Vi-.t) = piui-.t \ Wt+ur, yi:T)p{ut+T.T \ Vi-.t)- 


( 12 ) 


Assume that we have run a backward simulator from time T down to time 
t + 1. Hence, we have generated a partial, nonlinear backward trajectory ut->ri:T, 
which is an approximate sample from p{ut+i-.T \ Vi-.t)- To extend this trajectory 
to time t, we draw one of the RBPF particles (with probabilities 

computed below). We then set Ut-,T = {ul,Ut+i-.T} and discard This 

procedure is then repeated for each time t = T—1, ..., 1, resulting in a complete 
backward trajectory ui-,t- 

To compute the backward sampling probabilities, we note that the first factor 


in (121 can be expressed as. 


p{ui-,t I Ut+i-.T, Vt.t) oc p{yt+i-T, Ut+i-.T I ui-.t,VTt)p{ui-.t I Vi-.t)- (13) 


The second factor in this expression can be approximated by the forward RBPF, 
analogously to a standard FFBS. Similarly to (|^, this results in a point-mass 
approximation of the backward kernel, given by 

N 

P^{ui,t I Ut+i-.T, Vi-.t) = 

2=1 


with 


wl\T OC wlPiVt+l-.T,Ut+i,T\u\.^Vl-.t)- (15) 

We thus employ the following backward simulation strategy to sample ui-,t 
p{ui-.T I Vt.t)'- 

1. Run a forward RBPF for times t = 1, ..., T. 

2. Sample ut with with P(m 7 ’ = ulp) = w\-. 

3. For t = T — 1 to 1: 

(a) Sample Ut with P(rtt = u\) = 

(b) Set Ut-.T = {ut,Ut+i,T}- 


7 



Note that Step 3a) effectively means that we simulate u\.f from the point-mass 
approximation of the backward kernel ( |l4| ), discard and set Ut = u\. 

More detailed pseudo-code is given in Algorithm below. 

It remains to find an expression (up to proportionality) for the predictive 
PDF in (151, in order to compute the backward sampling weights. In fact, since 
the model is CLG, this PDF can be computed straightforwardly by running a 
conditional Kalman filter from time t up to T. However, using such an approach 
to calculate the weights at time t would require N separate Kalman hlters 
to run over T — t time steps, resulting in a total computational complexity 
scaling quadratically with T. To avoid this, we seek a more efficient computation 
of the weights (15). This is accomplished by propagating a set of sufficient 
statistics backward in time, as the trajectory ui-t is generated. Specifically, 
these statistics are computed by running a conditional backward information 
filter for Zt, conditionally on UfT, t = T, T — 1, ..., 1. The idea stems from 
who use the same approach for Markov chain Monte Carlo sampling in 
jump Markov systems. 

To see how this can be done, note that he predictive PDF in (15) can be 
expressed as 


P{yt+1-.T, Ut+l-T I Ui:t, yi-t) 


Jp{yt+i-T, Ut+I-.T I Zt, Ut)p{zt I Ui:t,yi-.t) dzt- 


(16) 


This expression is related to the factorization used in the two-filter smoothing 
formula. The second factor of the integrand is the conditional forward hltering 


density. This density, computed in the forward RBPF, is given by (10). Sim¬ 


ilarly, we can view the first factor of the integrand as the density targeted by 
a conditional backward filter, akin to the one used in the two-hlter smoothing 
formula. 

Indeed, we will derive a conditional backward information filter for this den¬ 
sity, and thereby show that 


p{yt+i-T,Ut+i-.T I zt,ut) oc Ztexp (-i {zj^tzt - 2XjZt)) 


(17) 


where Zt^ h 0 and Xt depend on ut, but are independent of z*, and the 
proportionality is ^vith respect to l^Jote that 0 is not a PDF in zt. 

Still, it can be instructive to think about the above expression as a Gaussian 
PDF with information vector Xt and information matrix flf We choose to 
express the backward statistics on information form since, as we shall see later, 
fit is not necessarily invertible. The interpretation of 0 as a Gaussian PDF for 
Zt implicitly corresponds to the assumption of a non-informative (flat) prior on 
Zt- As pointed out above, this interpretation might be useful for understanding 
the role of the backward statistics, but it does not affect our derivation in any 
way. 

^By proportionality with respect to some variable x, we mean that the constant hidden in 
the proportionality sign is independent of this variable. 







As an intermediate step of the derivation, we will also show the related 
identity, 


p{yt-T,Ut+i:T I Zt,Ut) oc exp (^zjhtZt - 2Xjzt'^'^ 


(18) 


for some ^ 0 and At and where the proportionality is with respect to Zf. Com¬ 
puting ( [I^ given 0 corresponds to the measurement update of the backward 
information filter (the measurement yt is taken into account). Similarly, com¬ 
puting 0 given 0, with t replaced by t — 1, corresponds to a backward 
prediction. 

Assume for now that 0 holds. To compute the integral ( [T^ we make use 
of the following lemma. The proof is omitted for brevity, but follows straight¬ 
forwardly by plugging in the expression for z and carrying out the integration 
with respect to 

Lemma 1 Let ^ ~ and let z = c + Ax + T^, for some eonstant vec¬ 

tors c and X and matrices A and T, respectively. Let ^ 0 and X be a 
constant matrix and vector, respectively. Then E [exp (—f (^z~^Llz — 2A^z))] = 
exp (—iy) with, 


7 — II Ax|| Q_Qpjy^_ipTQ 2a; A (/ LITM T ) 


where m = X — Lie and M = r^flT -|- I. 


-2A'c- lir'ml 


M- 


From (101 and (11) it follows that if we write 
zt = Zt^t + 


6~V(0,/), 


(19) 


then the distribution of Zt in ([T^ is p{zt \ ui.,t,yi,t). In the above, we have 


dropped the dependence on ui:t for brevity. The integral in (16) can thus be 
computed by applying Lemmawith c = Zf^t, a; = 0, T = T^jj, LI = Lit and 
A = At. It follows that. 


p(2/t+l:T,Ut+l:T I Ml:t,?/l:t) (X Zt\At\ exp (-^T^t) 
where the proportionality is with respect to ui.,t and with, 

Vt = ll^t|t||nt “ 2A7zt|f - ||r^((At - LltZt\t)\\\-i, 


At — r^^nthtit 


/. 


( 20 ) 

(21a) 

(21b) 


By plugging this result into (15), we obtain an expression for the backward 
sampling weights. It remains to show the identity 0 and to find the updating 
equations for the statistics {Zt,Llt, Xt}. These recursions will be derived explic¬ 
itly for the two model classes under study in the consecutive two sections. The 
resulting Rao-Blackwellized backward simulator is given in Algorithm As for 
a standard FFBS, the backward simulation is typically repeated M times, to 
generate a collection of backward trajectories which can be used to 

approximate p{ui.T \ Vi-.t)- 
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Algorithm 1 Rao-Blackwellized backward simulator 


1. Forward filter: Run an RBPF for time t = 1, 

2. Initialize: Draw ut with V{ut = u^) = w^. 
according to (231. 


., T. For each t, store 


Compute fix and At 


3. For t = T — 1 to 1: 

(a) Backward filter prediction: 

(Model 1: hierarchical) 

- Compute Zl = p{ut+i | uj) for f = 1, 


N. 


- Compute {fit, At} according to (25). 
(Model 2: mixed) 


- Compute XI} according to (33) for each forward filter 

particle i = I, ..., N. 

(b) For i = 1, ..., N: 

i. Compute {Al,ril} according to ( pT| . 

ii. Compute W) = wlZl\Al\-^/^ exp {-^4). 

(c) Normalize the weights, 

(d) Draw J with P( J = i) = 

(e) Set Uf.T = {ui.ut+xr}- 

(f) (Model 2: mixed) Set {Dt,At} = {n/,A/}. 

(g) Backward filter measurement update: Compute {DtjAt} ac¬ 


cording to (27). 
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4.2 Model 1 — Hierarchical CLG model 


We now consider Model the hierarchical CLG model, and prove the identities 
(17) and (18). We also derive explicit updating equations for the statistics 
{Zt,QtAt} and {rit.AJ, respectively. 


Remark 2 The expressions derived in this section have previously been pre¬ 
sented by who, independently from our preliminary work in \27l , have de¬ 
rived an REPS for hierarchical CLG models. Nevertheless, we believe that the 
present section will be useful in order to make the derivation for the (more 


involved) mixed linear/nonlinear model in Section 4-3 more accessible. 


For notational simplicity, we write At for A(ut) and similarly for other func¬ 
tions of Ut. To initialize the backward statistics at time T, we note that (2c) 
can be written as. 


p{yT\zT,UT) = M{yT\hT+ Ctzt,Rt) oc exp (^z/^tzt - 


( 22 ) 


with 


LIX = CrpRrp CT) 

At = C/Rf/{yT — hr), 


(23a) 

(23b) 


which shows that (18) holds at time t = T (with the convention ut+i-.t = 0)- 


We continue by using an inductive argument. Hence, assume that (18) holds at 
some time t -\-1. To prove 0 we do a backward prediction step. We have, for 
t < T, 

p{yt+V.T,Ut+l-.T I Zt,Ut) 

= piut+i I Ut) j p{yt+i-,T, Ut+2-.T I Zt+i,Ut+i)p{zt+i I Zt, Ut+i) dzt+1- (24) 


Using the induction Wpothesis and (2b), the above integral can be computed 
by applying Lemmawith c = ft+i, A = At+i, x = Zt,T = Ft+i, H = Ht-i-i 
and A = At+i. It follows that ( [^ coincides with ( [l7| ), with 

Zt =p{ut+i \ut), 


— AJ_^_i (^I — Llt+iFt+iM^_^)^F/^i^ rit+iAt+i, 


At — A. 


t+i 


fj - Clt+iFt+iM^j^^F/j^^ rnt+i, 


(25a) 

(25b) 

(25c) 


where we have defined the quantities mt+i = At+i — Ht+i/t+i and Mt+i = 
F)/_iTlt-\-iFt+i + I- Note that the above statistics depend on ut only through 


the factor p{ut+i \ Ut) in (25aI. This is important from an implementation point 
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of view, since it implies that we do not need to make the backward prediction 
for each forward filter particle; see Algorithm 

Next, to prove (18) for t < T, we assume that 0 holds at time t. We have. 


P{yt-.T, Ut+1:T I Zt, Ut) = p{yt I Zt, Ut)p{yt+1:T, Ut+l-T I Zt, Ut). 


(26) 


is given by (17). By collecting terms from the two factors, we see that (26) 


coincides with (18), where 


The first factor is given by (pel, analogously to (23), and the second factor 


+ CjRf ^Ct, 

At = At + Cj Rf ^(jjt — ht). 


(27a) 

(27b) 


As pointed out above, this correspond to the backward measurement update. 
Since we are working with the information form of the backward filter, the mea¬ 
surement update simply corresponds to the addition of a term to the information 
vector and the information matrix, respectively. 

4.3 Model 2 — Mixed linear/nonlinear CLG model 

We now turn to the mixed linear/nonlinear model (§ and prove the identi¬ 
ties 0 and ( [I^ for this class of systems. First, note that the measurement 
equations are identical for the models ([^ and ([^. Consequently, the initial¬ 
ization (23) and the backward measurement update (27) hold for the mixed 


linear/nonlinear model as well. We will thus focus on the backward prediction 
step. 


Similarly to (241 we factorize the backward prediction density according to. 


p{yt+i,T,Ut+i.,T I zt,ut) 

= p{ut+i I Ut,Zt) J piyt+i-.T,Ut+ 2 -.T I Zt+i,Ut+i)p{zt+i I Zt,Ut,Ut+i) dzt+i- (28) 


Note that the first factor now depends on zt- From (3a), we can express this 
density as. 


Piut+i \zt,ut) =M{ut+i;gt + BtZt,Qt) 

(X |Qt|-i/^exp (-i (||ut+i 

X exp (-i - 2 zjBjQ-\ut+i - gt 


)))■ (29) 


Next, we address the integral in (28). Since the process noise Vt enters the 
expressions for both Ut+i and Zt+i in (3a), there is a statistical dependence be¬ 
tween zt+i and Ut+i- In other words, since we allow for cross-correlation between 
the process noises entering on Ut+i and Zt+i, respectively, knowledge about Ut+i 
will contain information about Zt+i- This has to be taken into account when 


computing the second factor of the integrand in (28). To handle this, we make 


12 

























use of a Gram-Schmidt orthogonalization to decorrelate the process noises. Let 
Vf = QfVt, where 

Qt^I-GjQi^Gf (30) 

Note that Qf is a projection matrix: (Qt = Qt- It follows that E[uf (Gut)^] = 
0 and = Qf, We can then rewrite the dynamical equation pa] ) as, 

■^t+i = + BfZt + GtVt, (31a) 

^t+i = ft + AtZt + Ftvl , (31b) 


where 


ft — ft{ut,ut+i) — ft + FtGjQf ^{ut+i — gt), (31c) 

At = At{ut) = At-FtG]Q^^Bu (31d) 


and where the process noises entering on Ut+i and Zt+i are now independent. 


Hence, from (31b), we can write 


p{zt+i I Zt,ut,ut+i) — Af (^t+i; ft + AfZt, FtQfFj ). 


(32) 


The integral in (281 can now be computed by applying Lemma 0 with c = ft, 
A = At, X = Zt, T = FtQt, H and A = At+i. Combining this result with 

(29) and collecting the terms, we see that ( |2^ coincides with 0 with. 


Zt = \Qt\ exp (-Art) , 

fl* = AJ — flt+iFt"i!tFj^ Q,t+iAt + bJQ^ ^Bt, 

Xt = Aj (^I -ht+iFt^tFj^ mt, 


(33a) 

(33b) 

(33c) 


where we have defined the quantities 

Tt = \\ut+i - gt\\l-t + WftWl^^ - 2Xj^Jt - \\F;^m\\l^, 

^t = 

Mt = QtBt ^t+iFtQl + I, 

mt = At+i — Vtt+ijt. 


As opposed to the hierarchical model, the predicted backward statistics {Zj, Ht, At} 
all depend explicitly on ut for this model. This implies that the backward pre¬ 
diction has to be done for each forward filter particle, see Algorithm It 
should be noted, however, that the updating equations ( p^ can be simplified 
for some special cases of the mixed linear/nonlinear model (§. In particular, 
if the dynamics (3a) are Gaussian and linear in both zt and Ut (the measure¬ 
ment equation (3b) may be nonlinear in ut), it is enough to do one backward 
prediction. Models with linear dynamics and nonlinear measurement equations 
are indeed common in many applications, see m- 
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4.4 Smoothing the linear states 

Algorithm [l] provides a way of simulating nonlinear state trajectories, approxi¬ 
mately distributed according to p(wi:t I Vi-.t)- However, it is often the case that 
we are also interested in smoothed estimates of the linear states {zt}- These 
estimates can be obtained by fusing the statistics from a forward conditional 
Kalman filter, with the backward statistics computed during the backward sim¬ 
ulation. Note, however, that the forward statistics need to be computed anew; 
that is, we can not simply use the statistics from the forward RBPF. The reason 
is that the statistics should be computed conditionally on the nonlinear trajec¬ 
tories simulated in the backward sweep, which are in general different from the 
trajectories simulated by the RBPF. 

Let ui.,T be a backward trajectory generated by Algorihtmj^ To compute 
the conditional smoothing PDF for zt we start by noting that 

p{zt I U 1 :T, Vi-.t) oc p{yt+i-.T, ut+i-.T \ zt, ut)p{zt \ mi,*, j/nt). (34) 


Since the model is CLG, the latter factor can be computed by running a Kalman 
filter, conditionally on the fixed nonlinear state trajectory We get, 

p{zt I ui,t, yi:t) = Af{zt; Zt\t, Pt\t), (35) 


for some mean vector Zt\t and covariance matrix respectively (cf., (lOl). 
By fusing this information with the backward information filter, given by (17), 
we get. 


with 


p{zt I uuT, yi-.r) = M{zt, zt\T, Pt\T), (36a) 

Zt\T = Pt\T ) (36b) 

^t|T= (V+a)”'. (36c) 


The resulting method can be seen as a forward-backward-forward smoother. 
First, a forward RBPF is used to filter the data. Second, a backward simulator 
is applied to simulate nonlinear state trajectories. Finally, a new forward sweep 
is carried out to compute the smoothing distributions for the linear states. The 
complete RBPS is given in Algorithm 

Similarly to above, we may also compute, for instance, the two-step smooth¬ 
ing distribution p{zt-i-.t \ Mi:T,yi:r) which is typically required when using the 
smoother for parameter estimation. 


5 Extensions and computational aspects 

5.1 Approximate Rao-Blackwellization 

As pointed out in Section an alternative to SMC is to use some deterministic 
Gaussian approximation of the filtering and smoothing distributions. This gives 
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Algorithm 2 Rao-Blackwellized particle smoother (KBPS) 

1. Forward filter/backward simulator: Run Algorithm to simulate a 
nonlinear state trajectory Ui-t- For each t = 1, ..., T, store and Aj. 

2. Linear state smoothing: 

(a) Run a Kalman filter for the linear states, conditionally on ui-^- For 

each t = 1, ..., T, store the filtered mean and covariance: Pt\t\- 

(b) Compute the smoothed means and covariances according to (|36[). 


rise to methods such as the extended and the unscented Kalman filters and 
smoothers. In m, an unscented two-filter smoother is constructed by inverting 
the dynamical model. However, as pointed out in [3], inversion of the dynamics 
will in general not lead to the correct result. Instead, [3] suggest a generalized 
two-filter smoothing formula and use this as a basis for an unscented two-filter 
smoother (see also HI)- 

It is possible to combine these methods with the proposed RBPS. This en¬ 
ables smoothing for general nonlinear state-space models, in which one part of 
the state vector is approximated using particles and the other part of the state 
vector is handled using a deterministic approximation. This hybrid approach 
can be useful when deterministic approximations are found to be appropriate 
for some state variables, but insufficient for some other variables. 

Consider the following, general nonlinear SSM, 


Xt+l 



f{ut,Zt,Vt), 


yt = h{ut,Zt,et). 


(37a) 

(37b) 


with process noise Vt JV{0,In^) and measurement noise et ~ A/'(0,/„^), re¬ 
spectively. The partitioning of the state according to Xt = (ut, Zt) is in this case 
superficial, since the model is nonlinear in both variables. However, the parti¬ 
tioning is used to indicate which part of the model that we intend to address 
using particles, and which part that we intend to address using a deterministic 
approximation. 

Approximate Rao-Blackwellized forward filtering can be done for the model 
(37) by using, for instance, an extended or an unscented RBPF |25j. These 
methods are based on different types of Gaussian approximations. Let x G 
be a Gaussian distributed random vector and let (p : —)■ K""' be some 

(nonlinear) transformation. A Gaussian approximation scheme can be used to 
find a Gaussian approximation of the random vector 2 ; = <p{x). Examples of such 
approximations are first and second order Taylor expansions, i.e. linearizations, 
and the unscented transform m- Assume that the forward filter ( [l()| holds 
approximately. We then seek a generalization of the backward information filter 
given by 0 and ([I^ to the nonlinear setting. We suggest an approach which 
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draws upon the generalized two-filter smoothing formula by nig. 

Consider first the backward prediction step (28). Let us introduce the aux¬ 
iliary quantities 


7t(zt) = 


(38) 


for some user-chosen parameters fit and Sf. In Hi, these functions are viewed 
as artificial priors. Indeed, if (38) is viewed as a prior distribution on zt, then 
(37a) is a nonlinear transformation of the Gaussian vector (ut is fixed), 


'A7 


L? 0 
0 In. 


(39) 


To exploit this, we write (28) as 
piyt+l-.T,Ut+l-.T I Zt,Ut) 


= j P{yt+l:T,Ut+2-.T\zt+l,Ut+l) 


p{xt+i,Zt \ ut) 

lt{zt) 


dz-, 


t+i- 


(40) 


with p{xt+i,zt I ut) = p{zt+i,ut+i I zt,ut)"ft{zt)- By using ([^ and applying a 
Gaussian approximation scheme to the mapping. 


( ^‘+1^ = (fi'^t,zt,vt) 


\ 


Zt 


(41) 


we get 


ap^ox. 

Zt J 


M 


(S-V 


sr 


(42) 


for some vector c* and matrices Yfi and , respectively. Here, we have made 
the nonrestrictive assumption that the Gaussian approximation scheme applied 


to the identity mapping retains the Gaussian prior. By factorizing (42) we have 


where 


p{xt+i,zt I Ut) « N{xt+i]ft + A^zt, Qt)lt{zt), 


ft =ct- Affit, 


(43a) 


(43b) 

(43c) 

(43d) 


By plugging (43) into (40), we see that the factor jtizt) cancels. We can thus 
use the approximate dynamics defined by (43) in the updating formulas for 


the backward prediction (33). To recover the notation used in (31) and (33) 


however, we need to split the quantities defined in (43) according to the two 
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state components Ut+i and Zt+ii respectively. That is, we define g*, ft, Bt, At, 
Gt and Ft through, 


£X _ 

It — 


A^ = 


Q! = 


where the latter expression is given by for instance a Cholesky factorization of 

Qt- 

The backward measurement update (26) can be handled in a similar way. 

p{yt,zt\ut) 


We write (261 as 


p{yt:T,Ut+l:T I Zt,Ut) = p{yt+V.T,Ut+l:T \ Zt,Ut)- 


lt{zt) 


(44) 


with p{yt, zt I ut) = p{yt \ Zt, Ut)"ftizt) and jtizt) being a user-chosen Gaussian 
density as in (38), possibly different from the one used in the prediction step. 
As above, with ^t{zt) interpreted as an artificial prior, (37bI is a nonlinear 
transformation of the Gaussian vector. 


'N 


0 


0 In 

By applying a Gaussian approximation scheme to the mapping, 


we get 


Ut \ api^ox. 
Zt ^ 


h{ut,zt,et] 

Zt 


( sr 
V(sr)' 




(45) 


(46) 


(47) 


for some vector dt and matrices and ^, respectively. By factorizing (471 

we have 


where 


p{yt, Zt I Ut) « A/'(yt; ht + CtZt, Rt)^t{zt), 

Ct = Er(s*)-\ 

ht — dt GtPt, 

Rt = ^i- a(Er)T. 


(48a) 

(48b) 

(48c) 

(48d) 


The above quantities can then be used in the backward measurement update 
equations (27). 

To apply the approximate KBPS as described above, we need to choose the 
artificial priors (38). In [3], it is suggested to use the actual “prior” yt[zt) = 
p{zt I Ut) (recall that Ut is fixed at this stage of the algorithm), or some approxi¬ 
mation of this density. However, it is also pointed out that the approach is more 
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generally applicable. Indeed, requiring ^t{zt) to be close to p{zt \ut) is only im¬ 
portant if we want p{xt+i,zt \ Ut) and p{yt, Zt \ut) to be close approximations to 
p{xt+i, Zt I Ut) and p{yt, Zt \ Ut), respectively. For our purposes, this is not nec¬ 
essary, since the artificial priors cancel in (40) and (44). In fact, ')t{zt) serves as 


a type of indicator for the operational range in the state-space of the Gaussian 
approximations. A more natural choice might thus be to use the current esti¬ 
mate of Zt to specify jtizt), extracted either from the forward filter of from the 
backward filter. Indeed, if the Gaussian approximation scheme is based on a 
first order Taylor expansion, then the mean pt of the “artificial prior” is simply 
the linearization point for the Taylor expansion. It is easy to check that in this 
case ( |43b[ )-( [43d| reduces to: Af = §f^(ut, pt,0), ft = f lut,Pt,0) - A^pt, and 
Qt = y‘t,0)§^iut, pt,0)~^■ Hence, in this case the results will indeed 

be independent of the covariance matrix Ej of the artificial prior in (38), and 
choosing 7 t(zt) is equivalent to choosing the linearization point pt- 


5.2 MCMC and particle rejuvenation 

The FFBS algorithm [T^ forms the basis for the proposed KBPS. It has been 
recognized that two shortcomings of this algorithm are: (i) its computational 
complexity is of order 0{NMT), which can sometimes be prohibitively large, 
and (ii) the states simulated in the backward pass are constrained to the support 
of the forward hlter particles. However, in a modification of FFBS which 
addresses both of these issues is proposed. The idea is to make use of Markov 
chain Monte Garlo (MCMC) within the backward simulator to generate the 
backward trajectories—the same technique can be used also with the proposed 
KBPS, as we discuss below. 

As before, let Ut+i-.r be a partial backward trajectory. To extend this trajec¬ 
tory to time t, instead of simulating ui-,t from the backward kernel approxima¬ 


tion (14), we draw from some MCMC kernel which leaves the backward kernel 
invariant. Following [5], we use the RBPF particles, not from time t, but from 
time t — 1, to define the MCMC proposal: 

N 

q{ui,t\ut+i:T,yi:T) = '^vl_-i^q{ut\u\.t_i,Ut+i-.T,yi-.T)S^i^_^{ui,t-i), 

where and g(-) are chosen by the user (see [S] for suggestions on 

how to select these quantities). Similarly to (13) we then factorize the target 
distribution as 


p(ui:t I Ut+ 1 :T, 2 / 1 :t) OC p{yt+l-.T, Ut+l-.T \ Ml:t, 2/l:t) 

X p{yt I Ui,t, yi:t-i)p{ut I Ui,t-i,yi-.t-i)p{ui,t-i \ yi-.t-i)- 


Using the RBPF particles at time t— 1 to approximate this distribution we obtain 
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(r) 

the Metropolis-Hastings acceptance probability for a proposed move u\.J. 

P^{Ul-.t\ut+l:T,yi:T) qiu‘i^}\ut+l:T,yi-.T) \ 




min < 1, 


q{ul.^\ut+i-.T,yi-.T) 


I Ut+ 1 :T, yi-.r) 


where 


{ul,t\ut+l:T,yi:T) _ 
q{ul.t I Ut+l,T,yi-.T) 


vt_iq{ul I ul.^,ut+i:T, yi-.r) 

X p{yt+i-.T, Ut+1:T I ult, yi,t)p{yt I <:t, yi-.t-i) 

and analogously for the second factor (here, refers to the RBPF 


particle such that uJ.j = {u\ 


n)- 


Importantly, the expression above depends on the forward RBPF particle 
system only through the proposed sample Consequently, the computa¬ 

tional complexity of simulating each individual backward trajectory is indepen¬ 
dent of the number of forward filter particles N. Hence, if we run the MCMC 
sampler for R steps at each time point we get a total computational complex¬ 
ity of order 0{RMT). As pointed out in i? can typically be chosen much 
smaller than iV, resulting in a significant reduction in computational complex¬ 
ity. Furthermore, since we simulate Uf from (the possibly continuous) proposal 
density q{-), the backward trajectories are not constrained to the support of the 
forward filter particles. 

A related technique is to use rejection sampling to simulate the backward 
trajectories, as has been proposed by [5] for the FFBS. However, this requires 
an upper bound on the backward sampling weights (15) that holds uniformly 


for all backward trajectories {ul+i.T}jLi- It is not obvious how to choose this 
bound in the Rao-Blackwellized setting, making this technique less suitable for 
the RBPS. 


5.3 Square-root implementation 


As pointed out in Section |3.2[ it is in general recommended to implement the 
conditional Kalman filter of the RBPF on square-root form, to ensure symmetry 
and positive definiteness of the involved covariance matrices. The same holds 
for the conditional backward information filter. In this section, we show how to 


implement the backward recursions given by (25), (27) and (33) on square-root 
form. 

We use the technique proposed by [TB], which is based on a numerically 
robust QR-factorization, and adapt this to the present setting. For an arbitrary 
matrix A, we can factorize it as A = QTZ, where Q is orthogonal and TZ is upper 
triangular. Let be a matrix such that Qt = and similarly for Hj. 

Rather than computing the information matrices fit and fit in the backward 
filter, we will propagate the square-roots fl^^ and fl^^. 
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Consider first the backward measurement update (27). We compute a QR- 
factorization of the matrix, 




= Q 


0 


(49) 


1/2 

Here, i?/ can be computed by a Cholesky factorization of the measurement 
noise covariance matrix Rt- It follows that TZjTZi = + CjR^^Ct, which 

implies that = TZj. 

Next, we consider the backward prediction for the hierarchical CLG model, 
given by (p^. We compute a QR-factorization of the following matrix: 


It follows that 


TZjTZ 


qT /2 p 


t+l^t+1 52t+l^t+l 


nln 


7^|7^2 
2 


TZ^TZ^ 


0 

i I 1 


= Q 


7^l 

0 




TZ2 

7^3 


Aj^^rit+iAt+i 


(50) 


(51) 


From (51) and (25), we can identify 

TZJTZi = 

RJ) = HIlI H; 


2 “ : 


tzJtz, 


■3 — A+l^t+1 


At+-i — TZXTZo — 11 * 


(52a) 

(52b) 

(52c) 


Hence, = ”^3 A* = {Aj_^_i — TZjTZi Fj_^i)mt+i. 


Similarly, we can address the backward prediction for the mixed linear/nonlinear 


model (331 by computing the QR-factorization, 


a 


0 





By similar computations as above, we get = TZj and A* = {Aj—TZjTZ^ ~^Q^Fj)mt 


\ (TZi 

7^2^ 

= Q 0 

7^3 

/ VO 

0/ 

= TZj and A* = 


(53) 


6 Experimental results 

We evaluate the proposed RBPS on two numerical examples and compare its 
performance to alternative smoothers. The following methods are considered: 

• FFBS: A non-Rao-Blackwellized FFBS |13) . 

• RB-KS: A Rao-Blackwellized Kitagawa smoother pO] . 

• RB-FF/JBS: Rao-Blackwellized forward filter/joint backward simulator pT| . 
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• RB-FFBS: The proposed method (Algorithm]^. 

For all methods, a bootstrap PF [14] or RBPF jT] |29] is used in the forward 
direction. 

The RB-KS consists of running an RBPF and storing the nonlinear state 
trajectories. Smoothed linear state estimates are then computed by running 
constrained Rauch-Tung-Striebel (RTS) smoothers [5S], conditionally on these 
nonlinear trajectories. The RB-FF/JBS is an adaptation of the “joint backward 
simulator” by which runs an RBPF in the forward direction, but samples 
(ut, Zt) jointly in the backward direction. The method relies on having access 
to the linear state samples in order to compute the backward sampling proba¬ 
bilities. In fact, the method given in El is only applicable to hierarchical CLG 
models, but we modify it to work also for mixed linear/nonlinear CLGs. Further¬ 
more, we complement the method with constrained RTS smoothing to compute 
refined smoothed linear state estimates, which makes a more fair comparison 
(indeed, this is a simple “trick” that can be used to improve the performance 

of the method by my 


6.1 Estimation of a time-varying parameter 

We consider first a 5th order mixed linear/nonlinear system. The nonlinear part 
is given by the time series, 

ut+i = O.^ut + et 2 +8cos(1.2t)-fO.OTlu^ (54a) 

yt = 0.05ut -I- et, (54b) 


for some process {9t}t>i- The case with a static 9t = 25 has been studied by, 
among others, M- Here, we assume instead that 9t is a time varying parameter 
with known dynamics, given by the output from a 4th order linear system. 


Zt+i — 


/3 

2 

0 


VO 

9t = 25+ (0 


-1.691 

0 

1 

0 


0.849 

0 

0 

0.5 


-0.3201\ 
0 
0 

0 y 


Zt + O.luf 


0.04 0.044 0.008)24, 


(55a) 

(55b) 


with poles in 0.8 ± O.li and 0.7 ± 0.05i. Gombined, (54) and (55) is a mixed 
linear/nonlinear system. The noises are assumed to be white, Gaussian and 
mutually independent; ~ A/'(0,1), uf ~ A/’(0, 1) and et ^ A/'(0,0.1). 

We generate 1 000 batches of data from the system, each with T = 100 
samples. We run the smoothers two times, first with N = 300 and then with 
N = 30 particles. The backward-simulation-based methods use M = N/3 
backward trajectories, based on the recommendation to set M < N US). Tabled 
summarizes the results, in terms of the time averaged root-mean-squared errors 
(RMSE) for the nonlinear state Ut and for the time varying parameter 9t (note 
that 9t is a linear combination of the four linear states Zt). We emphasize that 
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the RMSE values are computed with respect to the “true trajectories”, and not 
with respect to the optimal smoother (which is intractable). That is, even the 
optimal smoother would have resulted in a non-zero RMSE, and this should be 
taken into account when interpreting the results reported in the table. 


Table 1: RMSE values averaged over 1 000 runs 


Smoother 

N = 

Ut 

300 

dt 

N = 

Ut 

30 

Ot 

FFBS 

0.499 

0.782 

1.203 

1.238 

RB-KS 

0.424 

0.660 

0.980 

0.909 

RB-FF/JBS 

0.399 

0.579 

0.967 

0.869 

RB-FFBS 

0.398 

0.564 

0.965 

0.836 


The proposed RB-FFBS gives the most accurate results among the con¬ 
sidered smoothers, both for N = 300 and N = 30. The difference between 
RB-FFBS and RB-FF/JBS is quite small. However, standard statistical hy¬ 
pothesis tests indicate indeed a clear statistically significant improvement for 
RB-FFBS over RB-FF/JBS. In fact, the small difference is not surprising, since 
these two methods are similar in many respects. We discuss this further in 
Section [71 

For further comparison. Figure [l] shows the estimates of 9t for one specific 
batch of data, using N = 300 and M = 100. This reveals a clear difference 
between the methods’ abilities of accurately representing the posterior distri¬ 
bution of 6t. For FFBS and RB-KS (the top row), there is a clear degeneracy 
in the trajectories. For RB-KS, this is expected, as it is a direct effect of 
the path degeneracy of the RBPF. For the (non-Rao-Blackwellized) FFBS, the 
degeneracy is caused by the fact that N = 300 particles is insufficient to rep¬ 
resent the posterior in all five dimensions, resulting in that only a few particles 
get significantly non-zero weights. This will cause the backward simulator to 
degenerate, in the sense that many backward trajectories will coincide. The 
Rao-Blackwellized backward simulators (bottom row) perform much better in 
this respect, as there is a much larger diversity among the backward trajectories. 

6.2 Tracking with a Constant Turn Model 

Next we consider the task of tracking a manoeuvering target from noisy obser¬ 
vations. A two dimensional constant turn model is used (see jH] for details). 
This has a single nonlinear state which describes the instantaneous turn rate of 
the target, and which evolves according to a random walk, 

Ut+i=ut + v'^. (56) 

The process noise is modelled as Cauchy distributed centered at zero. The 
linear state vector comprises the position and velocity of the target in Cartesian 
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Figure 1: Estimates of for t = 1, ..., T. From top left to bottom right; 
FFBS, RB-KS, RB-FF/JBS and RB-FFBS. Each curve corresponds to one par¬ 
ticle trajectory {Oi-t for FFBS and E[0i:t I Wi:T, J/i:r] for the Rao-Blackwellized 
smoothers). The true value is shown as a thick black line. 
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coordinates. The transitions are described by the equation, 


zt+i = A{ut+i)zt + F{ut+i)v^, (57) 

with Vf ^ A/’(0, cr^). See [21] for the definitions of A{ut+i) and F{ut+i). Noisy, 
radar-style observations are made of the target range and bearing from a fixed 
point (the origin), 


yt = 


tan 


-1 


( ^^2 




+ 2 , 


i,2 




(58) 


where the observation noises in the bearing and range measurements are white, 
Gaussian and mutually independent, with variances and tr^, respectively. 
This model cannot be Rao-Blackwellized directly, but may be treated using the 
approximate method of Section |5.1[ Specifically, we use a linearization of the 


observation model (58) around the filter mean. That is, we set fit = Zt\t in (38) 
(as pointed out in Section 5.1 the resulting method is independent of the choice 
of covariance matrix in (38) when using a first order Taylor expansion). 

The algorithms were tested on one of the standard benchmark cases de¬ 
scribed in [2], with simulated observations made every second (see Figure [^. 
The spread of is set to 0.03 rad/s, the process noise standard deviation 
Uz = 10 m, and the observation noise standard deviations to cr;, = ^ and 
Ur = 100 m, respectively. 

We simulated 100 batches of observations. The same four algorithms were 
tested as for the previous model. However, the non-Rao-Blackwellized particle 
filter regularly failed to track the target with a reasonable number of particles, 
making the FFBS impractical. It was thus excluded from the results. The 
approximate RBPF used N = 100 particles and the smoothers were used to 
sample M = 100 state sequences. 

RMSE values for the smoothed state estimates are shown in Table Again, 
we emphasize that the RMSE values are computed with respect to to the true 
trajectory (Figure [^, and not with respect to to the (intractable) optimal 
smoother. We see that RB-FFBS gives the most accurate results. However, the 
real advantage of the forward-backward smoothing algorithms is the increased 
number of unique particles (shown varying over time in Figure]^, which leads to 
a better characterisation of the posterior density. We can quantify this improve¬ 
ment by calculating the estimated posterior density of the true state p{zt | J/i:t) 
for each approximation. This is plotted in Figure]^ and clearly shows the supe¬ 
rior performance of the forward-backward smoothing algorithms over RB-KS. 
Also in this respect, the RB-FFBS algorithm appears to perform slightly better 
than the RB-FF/JBS. 

The experiment was repeated with different model parameters and numbers 
of particles. Qualitatively similar results were observed. 
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Figure 2: Benchmark fighter aeroplane trajectory from [5], with simulated ob¬ 
servations (crosses). 


Table 2: RMSE values averaged over 100 runs 


Smoother 

Ut 

Zt 

RB-KS 

0.176 

438 

RB-FF/JBS 

0.152 

382 

RB-FFBS 

0.129 

370 


7 Discussion 

We have derived, within a unified framework, an KBPS for two commonly en¬ 
countered classes of conditionally linear Gaussian models; hierarchical CLG 
models and mixed linear/nonlinear GLG models, respectively. The method 
provides a solution to the offline (batch) state-inference problem. Furthermore, 
it can be combined with standard techniques, such as particle expectation max¬ 
imization ini US] and particle MGMG m to address the system identification 
problem for these model classes (see and [32] for these two approaches, re¬ 
spectively, applied to jump Markov systems). Gompared to previously proposed 
KBPS, the proposed method differ on two key aspects: (i) it does not require 
any structural approximations of the model, and (ii) it Rao-Blackwellizes the 
linear state both in the forward direction and it the backward direction. 

The second point is in contrast with the RB-FF/JBS [TT], in which both 
the nonlinear and the linear states are simulated in the backward direction. 
Numerically, we found that the RB-FF/JBS performed quite similarly to the 
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Figure 3: Average number of unique particles at each time step for RB-KS (red), 
RB-FF/JBS (blue), and RB-FFBS (green). 



Figure 4: Average posterior log-density of the true state at each time step for 
RB-KS (red), RB-FF/JBS (blue), and RB-FFBS (green). 
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fully Rao-Blackwellized smoother (although, with a clear statistically significant 
difference in favour of the proposed method). This is not that surprising, since, 
essentially, the only difference between the methods is that for RB-FF/JBS the 
backward simulation weights are random (they depend on the linear state sam¬ 
ples). This gives rise to unnecessary Monte Carlo variance which slightly dete¬ 
riorates the performance of the method. In all other respects the two smoothers 
are very similar; in particular, they make use of the same forward RBPF to ap¬ 
proximate the backward kernel. In terms of computational and implementation 
complexity they are almost identical. In fact, the RB-FF/JBS can be seen as 
an (unnecessary) approximation of the method proposed herein—this approx¬ 
imation makes the derivation, but not the implementation or execution of the 
algorithm, simpler. With this in mind we believe that the proposed RBPS in¬ 
deed is the preferred method of choice of these two smoothers. Furthermore, in 
our opinion, the proposed method makes use of a more intuitively correct Rao- 
Blackwellization, since the marginalization is done both in the forward direction 
and in the backward direction. 
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